[57] ABSTRACT

A digital camera is used to scan documents and generate 
a corresponding digital output signal. A data processor 
receives the digital output signal and generates corre- 
sponding index information. The video and index infor- 
mation are then stored on one or more optical disks for 
search and retrieval. 

9 Claims, 1 Drawing Sheet 
INFORMATION STORAGE AND RETRIEVAL
SYSTEM 

BACKGROUND OF THE INVENTION

This invention is directed to information storage and 
retrieval, and more particularly to a system for storage 
and retrieval of large quantities of documents which 
may include text, illustrations or combinations thereof. 
The invention is particularly useful in the archival
storage of historical documents wherein it is desirable to
maintain the integrity of the historical document,
including its original appearance. As used herein and in
the appended claims, the term "archival document" shall
be used to refer to a document containing textual
information but wherein the appearance of the original
document, and not merely the content of the text, is
significant, and an "archival document storage system" or
"archival document image storage and retrieval system"
shall be used to refer to a system for storing and
retrieving images of archival documents where the 
appearance of the original document is of interest to the 
user of the system. 

In a conventional archival storage system, documents 
would typically be separated into files. In some cases,
large quantities of documents would be stored with no
practical indexing, so that reasonable access to the
documents would not be available to anyone but a few
scholars who knew where to find them. An indexing
system could be manually generated and researchers
could access the stored documents through an index 
card file, but manual indexing systems have not proven 
entirely satisfactory. 

When using an index card file system, it is still
necessary to retrieve a document from the storage files in
order to determine if it is relevant. If relevant, it is then
necessary to obtain a photocopy or other reproduction 
of the document. These processes can take considerable 
time where a large number of documents are involved. 

The handling of the documents contributes to the
deterioration of the documents, which can be a long 
term problem in an archival storage system. 

Still further, the complexity of manually generating 
an index card file system can itself be a disincentive for 
maintaining such a system when extremely large
numbers of documents are involved.

It is desirable to provide some type of automated 
search capability, and it is known in some systems to 
index documents, e.g., by key words, and to permit 
automated searching. However, this facilitates only the
searching aspect of the conventional system described
above, and it is necessary to manually retrieve
documents, to take the document to a photocopy station to
obtain a copy of the document, and to manually
generate the key words and phrases which will be used in the
indexing system. 

In at least one publicly available storage and retrieval
system, i.e., the automated search system currently
maintained by the U.S. Patent and Trademark Office
for searching U.S. patents related to data processing,
pertinent portions of the documents covered by the data
base are stored on microfiche. Each document must be
read by Patent Office personnel who will then assign
that document to one or more descriptive headings. A
system user can then key in a particular heading or a
plurality of headings combined with logical operators,
and the system will display from microfiche the stored
portions of every document satisfying the search
request. While such a system represents a substantial
improvement over entirely manual systems, it is still not
entirely satisfactory in a document storage and retrieval 
system employing very large numbers of documents, 
e.g., many millions of pages of text and drawing. The 
microfiche storage capacity is insufficient for such large 
numbers of documents, and the speed of retrieving the 
appropriate microfiche for display would also be unsat- 
isfactory in a system of great size. Further, the pages of 
documentation are recorded on microfiche by a con- 
ventional photographic process, and there is no oppor- 
tunity for the system to recognize the content of the 
documents being photographed. All key words and
descriptive headings must therefore be manually en- 
tered. Still further, it is necessary for each viewing 
station to have its own set of microfiche, or at least for 
all viewing stations to be located immediately adjacent 
the microfiche file. 

SUMMARY OF THE INVENTION 

It is an object of the present invention to provide a 
storage and retrieval system which overcomes the 
above-described drawbacks. 

It is a further object of this invention to provide such 
a system which is capable of storing very large numbers 
of documents in an efficient manner. 

It is a still further object of this invention to provide 
such a system which will permit rapid automated 
searching of the stored documents, retrieval of the doc- 
uments for display, and on-site printing of the docu- 
ments while maintaining the integrity of the appearance 
of the original document.

It is yet a further object of this invention to provide a 
document storage and retrieval system wherein the 
requirement for manual data entry is minimized to sub- 
stantially ease the burden of generating the desired 
document index. 

It is a further object of this invention to provide a 
document storage and retrieval system wherein a re- 
mote user can access the data base for storage, retrieval 
and printing of documents including both printed text 
and drawings. 

Briefly, these and other objects of the present inven- 
tion are achieved by an archival mass storage and re- 
trieval system using a digital camera to capture data in 
a digital form. The output of the digital camera is pro- 
vided as an input to a general purpose digital computer 
which, in turn, is provided with an optical disk storage 
system. The digital information representing the object 
scanned by the digital camera is stored for subsequent 
retrieval on the optical disk together with appropriate 
index information. 

The digital camera can thus be used to scan photo- 
graphs, blueprints, pages of books, memos, etc., or even 
three-dimensional objects, and can accurately transform 
any of these into data which can be subsequently han- 
dled by the computer system. With all data in digital 
form, the documents can be transmitted to or from 
remote locations. The computer is preferably also pro- 
vided with software necessary for recognizing text con- 
tained in the bit-map output of the digital camera so 
that, upon recognizing the content of any document, the 
appropriate index information can be automatically
generated. The captured images and associated index 
information are then stored on an optical disk for future 
searching and retrieval. Retrieved documents can be 
displayed on the computer monitor and can be
reproduced on any commercially available printer having
graphics capability. 

BRIEF DESCRIPTION OF THE DRAWING 

The invention will be more clearly understood from
the following description in conjunction with the
accompanying drawing, wherein FIG. 1 is a block
diagram of essential components of the
storage and retrieval system according to the present
invention, and FIG. 2 is a brief flow chart illustrating an
efficient procedure for document capture and indexing.

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 

A block diagram of the overall system of the invention
is shown in the accompanying Figure. The system
includes a digital camera 10 for scanning documents and
converting the image information of each document to
a digital information stream. The digital output of the
camera 10 is provided to a processor 12, which may be
any one of a number of suitable general purpose digital
computers. The processor 12 receives the image
information and preferably stores this information in a buffer
storage, e.g., disk storage device 14. The processor 12
then generates the appropriate indexing information,
with the help of additional data manually entered via
keyboard 20, if necessary, and the multiple frames of
image information together with the associated index
information are stored on a digital storage medium in a
manner to be described in more detail below. In a
preferred embodiment of the invention, the digital storage
medium will be an optical disk, although other types of
high density storage devices would be acceptable. The
index stored on the optical disk can then be searched by
the processor 12 to obtain the addresses of frames
having requested information. The disk locations can then
be accessed to retrieve the video information for display
on a display 16. If desired, the displayed video
information can be reproduced via printer 18.
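
The search-and-retrieve flow of this overview (index lookup yielding
frame addresses, followed by access to those addresses) can be
illustrated with a minimal sketch. It assumes an in-memory stand-in
for the optical disk; the class name DiskStore, its method names, and
the use of list positions as frame addresses are hypothetical and are
not part of the system described.

```python
from dataclasses import dataclass, field


@dataclass
class DiskStore:
    """Hypothetical stand-in for the optical disk: numbered frames plus an index."""
    frames: list = field(default_factory=list)   # frame address -> image bytes
    index: dict = field(default_factory=dict)    # key word -> set of frame addresses

    def store(self, image: bytes, key_words: list) -> int:
        """Append one image frame and record its address under each key word."""
        address = len(self.frames)
        self.frames.append(image)
        for word in key_words:
            self.index.setdefault(word.lower(), set()).add(address)
        return address

    def search(self, key_word: str) -> list:
        """Return the sorted frame addresses indexed under the key word."""
        return sorted(self.index.get(key_word.lower(), set()))

    def retrieve(self, key_word: str) -> list:
        """Search the index first, then fetch the frames at the addresses found."""
        return [self.frames[address] for address in self.search(key_word)]


disk = DiskStore()
disk.store(b"<bitmap of letter>", ["Lincoln", "letter"])
disk.store(b"<bitmap of blueprint>", ["blueprint", "Capitol"])
disk.store(b"<bitmap of memo>", ["memo", "Lincoln"])
lincoln_addresses = disk.search("lincoln")
```

Keeping the index separate from the frames mirrors the description:
only the index is searched, and image data is read only for the
addresses that the search returns.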

The individual components and their interaction will
now be described in more detail. 

With regard first to the digital camera 10, the camera
should be a high resolution digital camera, such as a
scanning-type CCD camera having a resolution on
the order of 3000 × 1500 pixels. One example of a
suitable camera for use in this invention is the Model 610
camera available from Data Copy of Mountain View,
California. The camera is available with a suitable
software routine to permit the processor 12 to receive and
process the digital output from the camera for either
storage, display, printing or retransmission, e.g., to an
optical disk recorder.

The video information provided by the camera 10
may represent a typewritten or handwritten document,
a blueprint, photograph or even a physical object, and
the output of the camera 10 will be a digital bit stream
provided to the processor 12. The processor 12 may be
any one of a number of suitable general purpose digital
computers, such as the PC XT microcomputer available
from International Business Machines Corporation. The
processor stores the digital video information in a disk
buffer storage device 14. In the case of a typewritten
document, the processor may be provided with
software for converting the bit-mapped video information
into ASCII character data, so that the processor can
determine the information content of the text. Having
determined the information content of the text, the
processor employs appropriate software to generate the
index and cross referencing information which can also
be stored in buffer 14. The software should preferably
be a self-index software program which will generate a
data base index from full text, so that every word in the
text will become a key word in the index. A suitable
indexing software program would be the ZyINDEX
program available from ZyLAB Corp. of Chicago,
Illinois.
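
A full-text index of the kind described, in which every word of the
recognized text becomes a key word, can be sketched as an inverted
index. This is an illustrative sketch only and makes no claim about
how the ZyINDEX program works internally; the function name, the
sample documents and the tokenizing rule (lowercase runs of letters
and digits) are assumptions.

```python
import re
from collections import defaultdict


def build_full_text_index(documents):
    """Map every word found in the text to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Assumed tokenizing rule: lowercase runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


# Hypothetical recognized text keyed by document id.
documents = {
    1: "Report on the survey of the river",
    2: "Survey notes, second expedition",
}
index = build_full_text_index(documents)
```

Because no word is excluded, even common words such as "the" become
key words, which matches the statement that every word in the text
is indexed.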

The text recognition software for recognizing text 
and generating ASCII characters from bit mapped 
image data is presently available and/or can be gener- 
ated from commercially available software in a very 
straightforward manner. For example, commercially 
available OCR software in effect examines the bit map, 
or pattern of pixels, of the character image as a scanner 
moves across a page. Thus, the scanner momentarily 
"captures" the image of each character and recognizes 
each captured character image before going on to the 
next character. To recognize in software the characters 
represented by a captured digital image of an entire 
document, it is merely necessary to scan the digital 
image in the same manner as the original document 
would have been optically scanned by the OCR device. 
For example, the captured image could be displayed on 
a monitor, and a cursor, which may preferably cover 
substantially the same area as would be covered by an 
OCR scanner, may be moved across the monitor screen 
simulating the same scanning motion as an OCR scan- 
ner. The image portion covered by the cursor would be 
recognized by the OCR software in the same manner as 
is conventionally done. One example of OCR software 
suitable for modification for use in the present invention 
is the software used in the OMNI-READER optical
character reader available from Oberon International
having offices in Irving, Texas and London, England.
The only modification necessary to this software would 
be to substitute the digital image information, from a 
region of the image covered by a cursor, for the image 
information signal which would normally be provided 
to the software from the scanner output. 

Alternatively, software for recognizing text from bit 
mapped image data is available from Bell Northern 
Research, P.O. Box 3511, Station C, Ottawa, Canada. 
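
The cursor-based recognition described above, in which a window the
size of an OCR scanner's field is moved across the stored image and
each covered region is passed to the recognition routine, can be
sketched as follows. The sketch assumes a toy bitmap of fixed-width
3 × 3 glyphs and an exact template match standing in for real OCR
software; the glyph size, spacing, templates and function names are
all hypothetical.

```python
GLYPH_W, GLYPH_H, GAP = 3, 3, 1   # assumed cursor size and inter-character gap

# Toy templates standing in for the OCR software's character models.
TEMPLATES = {
    ("#..", "#..", "###"): "L",
    ("###", ".#.", "###"): "I",
    ("###", ".#.", ".#."): "T",
}


def recognize_window(window):
    """Stand-in for the OCR routine: exact template match, '?' if unknown."""
    return TEMPLATES.get(tuple(window), "?")


def scan_line(bitmap):
    """Move a cursor-sized window left to right across one line of the image."""
    text = []
    for x in range(0, len(bitmap[0]) - GLYPH_W + 1, GLYPH_W + GAP):
        # Cut out the region of the stored image covered by the cursor.
        window = [row[x:x + GLYPH_W] for row in bitmap[:GLYPH_H]]
        text.append(recognize_window(window))
    return "".join(text)


# One line of a captured page: '#' is a dark pixel, '.' a light pixel.
page = [
    "#...###.###",
    "#....#...#.",
    "###.###..#.",
]
recognized = scan_line(page)
```

The only change relative to scanning paper is the source of each
window: it is sliced from the stored bitmap instead of arriving from
a scanner head, which is the substitution the text describes.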

If it is desired to utilize in the indexing routine addi- 
tional key words for a particular document which are 
not found in the recognized text, such additional key 
words, e.g., category headings, titles, etc., can be en- 
tered via keyboard 20. For documents which are draw- 
ings or other images which cannot be recognized by the 
processor 12, or if the document is a typewritten
document which has deteriorated to an extent that
precludes machine recognition, appropriate key words
and/or titles will be entered via keyboard 20 for use in 
the indexing routine. 
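
The two indexing paths just described (recognized text supplemented
by optional key words, and key words alone where recognition is not
possible) can be sketched as a single routine. The function name and
the convention of passing None for an unrecognizable document are
assumptions made for illustration.

```python
def index_terms(recognized_text, manual_key_words=()):
    """Combine machine-recognized words with key words entered by the operator.

    recognized_text is None when the document is a drawing or is too
    deteriorated for machine recognition (an assumed convention).
    """
    terms = set()
    if recognized_text:
        terms.update(recognized_text.lower().split())
    # Manual entries such as category headings or titles are kept whole.
    terms.update(word.lower() for word in manual_key_words)
    if not terms:
        raise ValueError("unrecognized document needs at least one manual key word")
    return terms


typed_page = index_terms("Deed of sale", ["Land Records"])
blueprint = index_terms(None, ["blueprint"])
```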

As described above, the preferred embodiment of this
invention will use an optical disk storage, and the
invention will be described in this context. However, it
should be appreciated that other storage devices could
be used, and that the invention is not limited to use with
optical disk storage but is instead limited only by the
language of the appended claims. The least expensive
technique for optical disk recording is to generate a
master disk from 1" C tape, with the image information
being represented by analog signals. Accordingly, the
digital information from the buffer storage 14 can be
provided in the form of a digital signal to a translator 22,
e.g., such as available from Matrox Electronics Systems,
Ltd. of Montreal, Canada, which preferably performs
a simple D/A conversion to provide an analog
video signal at its output. The analog video signal output
from the translator 22 can then be recorded on
standard video tape 24, e.g., of the type well known in
the art. The indexing information would then be similarly
provided through the translator 22 for recording
on the tape 24. The order in which the video information
and index information are provided for recordation
on the tape 24 could be reversed, but it is preferable that
all of one type of information be recorded followed by
the other type rather than interleaving video and index
data. The tape would then be used to generate a master
disk in a disk mastering station 26, e.g., available from
Phillips. With the video and indexing information
recorded on different areas of the disk, it is a simple matter
to designate certain track numbers as index storage area
to facilitate accessing of the index information.

Since the index information is in ASCII format rather
than bit-mapped image, it would also be possible to
record the index information in digital form on the tape
24 and master disk. In such a case, the index information
could be provided through the translator 22 without
conversion, and the separation between image and
index information on the optical disk could be indicated
merely by noting the track number above or below
which a digital signal format is employed.

After generating a desired number of disk copies, e.g.,
plural copies for distribution to remote accessing
stations, a recorded optical disk may be placed in an
optical disk playback device 28. This may, for example,
comprise a standard laser disk player commercially
available from several sources. The laser disk player
could have an RS-232 or other suitable computer
interface for coupling to the processor 12, and the disk
player 28 can then be accessed and controlled by the
processor 12. For search and retrieval, the operator
would enter via keyboard 20 a key word to be searched,
and the processor would then search the index
information on the optical storage media to determine the disk
location of any documents responsive to the key word
inquiry. These locations would be sent back to the
processor 12, which would then control the disk player to
access each of the locations and retrieve the image data.
The bit-mapped digital image data could then be
displayed on a display (e.g., a monitor) of sufficiently high
resolution to display not only retrieved text but also
detailed picture images, e.g., a vertical resolution of not
less than approximately 1500 lines. Such monitors are
available from a number of sources, e.g., DataCopy. If
any of the retrieved documents are considered
sufficiently pertinent, a copy thereof, either text or video
image, or both, can be reproduced on a typical
graphics-quality printer 18.

In one example of an application of the system of this
invention, a large quantity of documents would be
successively "photographed" by the camera 10, and the
digital information would be stored in buffer storage 14.
After storing a sufficient quantity of image data in the
buffer storage 14, the processor 12 would provide the
digital data for recording on high density storage media,
e.g., either through translator 22 and tape 24 to the
optical disk mastering station 26 or directly to a digital
optical disk recorder. If documents are to be added to
the system at a relatively slow rate, it may be
satisfactory to perform the text recognition substantially
simultaneously with the capturing of the image information
by the digital camera 10. However, if large numbers of
documents are to be captured and indexed, the most
efficient procedure would be to successively capture
the documents and to generate an optical disk, or indeed
a plurality of optical disks, storing only the captured
image information from the documents. After storing
the digital image information, the images can be
retrieved from the disk players and the index information
can be generated at that time either by text recognition
software or by entry of index information via keyboard
20, and the image data together with the corresponding
index information can then be stored on a new set of
optical disks. This operation is briefly illustrated in the
flow chart of FIG. 2.

Thus, since the digital image information can be
recognized later just as easily as at the time of original
capture, the image capturing process need not be
delayed while the indexing software and text recognition
software are operating. Indeed, with the ability to
recognize and index the captured images at a later time, it
is possible to capture all of the document images on one
processor while performing text recognition and
indexing on another processor, thus substantially enhancing
the throughput speed.

Since constant changing of the optical disks may be
impractical, the storage capacity may be increased
either by using a multiple-disk player or by stacking a
plurality of disk players and separately addressing each
player. A suitable multiple-disk player is available from
Mitomo Corporation and provides an expanded storage
capacity on the order of 50 individual disks.
However, accessing speed would be improved by utilizing a
plurality of separate players. Due to the present low
cost of disk players, the second alternative may be
preferable.

It should also be noted that, in either type of system
employing multiple disks, it may be preferable to
dedicate entire disks to index information rather than
particular sections of each disk.

With regard to the optical disk recording, the
embodiment of the invention reduced to practice employs
the translator 22, 1" C tape 24 and disk mastering station
26. This requires that the translator perform D/A
conversion when providing the information from the
processor 12 to the tape 24, and it also requires that the
translator 22 perform A/D conversion when image
information is read from the disk player 28 and
provided to the processor 12. However, optical disk
recording technology has advanced to the point where
the image data and index information could be recorded
directly rather than via the tape 24 and disk mastering
station 26. This would also permit elimination of the
translator 22. Instead, the information from the
processor 12 could be provided directly to a digital optical
disk recorder 30, recorded in digital format, and played
back through a digital disk player 32. However, present
optical disk recorders are somewhat expensive and may
not yet be justifiable from a cost standpoint.

In the preferred embodiment described above, the
processor 12 is provided with appropriate software to
generate ASCII code from bit-mapped image data, so
that the indexing information could be automatically
generated with minimal operator input. In those cases,
such as drawings or unreadable text, where the
processor 12 cannot generate its own indexing information, the
necessary information could be entered via keyboard
20. In the case of unreadable text it would normally be
sufficient merely to enter a short description of the
documents being recorded. However, in those instances
where it is desirable to enter a substantial portion or all
[illegible] the processor 12 via [illegible]
with software for recognizing [illegible]
provided from the camera 10, an OCR 34, e.g., of the
type available from Kurzweil Computer Products, Inc.,
185 Albany Street, Cambridge, Massachusetts, or DEST
Corporation, 1201 Cadillac Court, Milpitas, California,
can be used to read and recognize all or a substantial
portion of the documents being stored.

It should be noted that the OCR is not used as a
substitute for the digital camera, but rather only as a
substitute for recognition software in the processor 12
which would otherwise be used to recognize the textual
content of the digital camera output. In other words,
the digital camera 10 is used in all instances to generate
digital image information representing the document
and this digital image information is eventually stored
on the optical disk or other high density storage
medium. The information generated by the recognition
software in the processor 12, by the OCR 34, or by
manual entry via keyboard 20, is used solely for
indexing purposes.

For large drawings, e.g., blueprints, it would be
possible [illegible] data via a digitizer/plotter 36
providing a digital bit-map output corresponding to the
[illegible]
to be accessed from remote locations by researchers and
scholars around the world. In [illegible]
received from a remote location. [illegible] document
which may be text or a [illegible]
modem for remote display or remote printing.

System throughput may be enhanced using
commercially available automated document positioning
systems to permit rapid scanning of successive documents,
in which case it would also be desirable to equip the
camera 10 with an autofocus capability available from
the camera manufacturer.

The system may also be enhanced by providing one
or more portable data capture systems comprising a
camera, a portable processor having a removable hard
disk, and a simple display such as a flat panel plasma
display. The image data could then be captured and
transferred at a later date to the optical disk storage.

The document storage and retrieval system described
above provides enhanced storage capacity while
simplifying the indexing and improving accessibility to the
stored information. The use of the digital camera 10
results in digitizing image information, making it
possible to process the image information for recognition of
[illegible] transmit both text and [illegible] to remote
locations and print the documents on available
graphics-quality printers. Most importantly, the system
provides for automated storage and retrieval of documents
[...]
modifications can be made to the specific embodiment
disclosed above without departing from the spirit and
scope of the invention as defined in the [illegible]

[illegible]

1. An archival document image storage and retrieval
system for storing image information representing the
content of a plurality of archival documents,
maintaining an index of descriptive terms and documents, and
retrieving stored image information corresponding to a
particular document in accordance with said index
upon identification of a corresponding descriptive term,
said system comprising:
a camera for generating electronic digital output
signal representing the image of a document;
data processing means for providing a processor
output representing both said image and index
information corresponding to said document, said data
processing means including means for receiving
[illegible] storing said digital output signals from said
camera representing a plurality of document
images, means for retrieving said stored images from
said means for storing after said digital signals
representing said plurality of document images have
been stored, means for recognizing the content of
[illegible]

2. [illegible] image storage and retrieval
[illegible] providing a further input to said
[illegible] the [illegible] of a
[illegible] processing means
further [illegible] means for generating said index
information in accordance [illegible] further input.

3. An archival document image storage and retrieval
system as defined in claim 2, wherein said processor
input means comprises a keyboard.

4. An archival document image storage and retrieval
system as defined in claim 3, wherein said processor
input means comprises an optical character recognizer.

5. An archival document image storage and retrieval
system as defined in claim 4, wherein said digital optical
storage means comprises at least one optical disk for
storing said processor output.

6. An archival document image storage and retrieval
system as defined in claim 5, wherein said digital optical
storage means comprises a plurality of optical disks.

7. An archival document image storage method for
storing image information representing the content of a
plurality of documents, maintaining an index of
descriptive terms and documents for subsequent retrieval of
stored image information corresponding to a particular
document in accordance with said index upon
identification of a corresponding descriptive term, said method
including the steps of:
generating with a camera a succession of electronic
[...]
after said succession of digital output signals have
been stored, reading said succession of digital
output signals from said first storage medium;

receiving said read digital signals in a data processing
means, recognizing the content of at least a portion
of said digital output signals from said first storage
medium and producing recognition results,
generating index information corresponding to each
document in accordance with said recognition
results and providing a processor output
representing both the image of each document and the
generated index information corresponding to each
document; and
optically recording said processor output.

8. A method according to claim 7, wherein said step 
of optically recording said processor output comprises 
optically recording said processor output on a second 
storage medium different from said first storage me- 
dium. 

9. A method according to claim 8, wherein said first
and second storage media are both optical disks.
* * * * *